Abstract. Entropy rate is a real-valued functional on the space of discrete random
sources for which it exists. However, it lacks existence proofs and/or closed
formulas even for classes of random sources which have intuitive parameterizations.
A good way to overcome this problem is to examine its analytic properties
relative to some reasonable topology. A canonical choice of topology is that
of the norm of total variation, as it arises immediately from the idea of a discrete
random source as a probability measure on sequence space. It is shown that both
upper and lower entropy rate, hence entropy rate itself if it exists, are Lipschitzian
relative to this topology, which, by well-known facts, is close to differentiability.
An application of this theorem leads to a simple and elementary proof of the
existence of entropy rate of random sources with finite evolution dimension. This
class of sources encompasses arbitrary hidden Markov sources and quantum random
walks.

Keywords. Analytic properties, discrete random source, entropy rate, evolution

1 Introduction

Entropy rate is a key quantity in information theory as it is equal to the average amount
of information per symbol of discrete-time, discrete-valued stochastic processes (usually
referred to as discrete random sources in the following). Therefore, it is natural
to ask how entropy rate behaves if knowledge of discrete random sources is subject to
uncertainties which, for example, may be inherent to inference processes and/or originate
from noisy channels. However, closed formulas for entropy rate exist only for rare
examples of classes of discrete random sources. For instance, already hidden Markov
sources (HMSs) seem to defy a convenient formula although there is one for the special
case of Markov sources. Therefore, in this case, recent efforts focused on the direct
investigation of analytic properties of entropy rate like smoothness or even analyticity.

The purpose of this paper is to contribute to the issue of analytic properties of
entropy rate in a more general fashion. Namely, we study the behavior of entropy rate
relative to the topology induced by the norm of total variation. This topology is one of
the natural choices and it is ubiquitous in both theoretical and practical work. We show
that entropy rate is Lipschitzian on the whole space of discrete random sources which
is, due to an elementary theorem of Rademacher, close to differentiability.

We will use this result to give an elementary proof of the existence of entropy rate
for sources with finite evolution dimension [6], which contain the classes of arbitrary
HMSs and quantum random walks (QRWs) [1].

The paper is organized as follows. We will identify discrete random sources with
probability measures acting on the measurable space of symbol sequences equipped
with the $\sigma$-algebra generated by the cylinder sets of sequences. Therefore, in section
2 we will briefly compile the theory's standard arguments. In section 3 we prove that
entropy rate is Lipschitz continuous relative to the topology induced by the norm of
total variation, which is the main contribution of this paper. In section 4 we demonstrate
how to exploit this result for an elementary proof of the existence of entropy rate of
random sources with finite evolution dimension, which include HMSs and QRWs as special
cases. In section 5 we will describe the proof's intuition, thereby commenting on open
problems such as other choices of topology and/or stricter choices of analytic properties.

2 Random sources and entropy rate 

As usual, $\Sigma^* = \bigcup_{t \ge 0} \Sigma^t$ is the set of all words (strings of finite length) over the finite
alphabet $\Sigma$ together with the concatenation operation

$$v \in \Sigma^t,\; w \in \Sigma^s \;\Longrightarrow\; vw \in \Sigma^{t+s}. \tag{1}$$

Throughout this paper $\Omega = \Sigma^{\mathbb{N}} = \bigotimes_{t \ge 0} \Sigma$ is the set of sequences over $\Sigma$ and $\mathcal{B}$ is
the $\sigma$-algebra generated by the cylinder sets. Cylinder sets $B$ are identified with sets of
words $A_B \subset \Sigma^t$ such that $B$ is the set of sequences which start with the words in $A_B$.
In general, the cardinality of a set $A$ is denoted by $|A|$.

We view stochastic processes $(X_t)_{t \in \mathbb{N}}$ with values in $\Sigma$ as probability measures $P_X$
on the measurable space $(\Omega, \mathcal{B})$ and vice versa via the relationship ($v = v_0 \dots v_{t-1} \in \Sigma^t$
corresponds to the cylinder set of sequences having $v$ as prefix)

$$P_X(v) = P(\{X_0 = v_0, X_1 = v_1, \dots, X_{t-1} = v_{t-1}\}), \tag{2}$$

where the term on the right hand side is the probability that the random source emits
the symbols $v_0, \dots, v_{t-1}$ at periods $0, \dots, t-1$. Note that a stochastic process $(X_t)$ is
uniquely determined by the values $P_X(v)$ for all $v \in \Sigma^*$ as the cylinder sets
corresponding to words $v$ generate $\mathcal{B}$.

Although the norm of total variation is a canonical choice of norm (see appendix A for a
short review of the related theory and corresponding definitions), computing it from its
original definition alone would not be easy for the measurable space under consideration.
The following lemma shows a concrete way to get a grip on
the corresponding topology. Exact definition and basic properties of the norm of total
variation have been deferred to appendix A.
Lemma 1. The topology induced by the norm of total variation is that of the metric

$$d_{TV}(P_X, P_Y) := \lim_{t \to \infty} \sum_{v \in \Sigma^t} |P_X(v) - P_Y(v)| \tag{3}$$

where $P_X, P_Y$ are probability measures associated to discrete random sources $(X_t), (Y_t)$.

Proof. See sec. A.2 of the appendix for the predominantly measure-theoretical
arguments. □



2.1 Entropy Rate 

In the following, we will refer to the quantities

$$\overline{H}(X) := \overline{H}(P_X) := \limsup_{t \to \infty} H^t(P_X) \tag{4}$$

resp.

$$\underline{H}(X) := \underline{H}(P_X) := \liminf_{t \to \infty} H^t(P_X) \tag{5}$$

as upper entropy rate resp. lower entropy rate of a random source $(X_t)$ with associated
probability measure $P_X$, where, using the language introduced above,

$$H^t(P_X) := -\frac{1}{t} \sum_{v \in \Sigma^t} P_X(v) \log P_X(v) \tag{6}$$

is the entropy of the distribution over the words of length $t$ induced by the random
source, divided by $t$. Entropy rate of a random source $(X_t)$ with associated probability
measure $P_X$ is denoted by

$$H(X) := H(P_X) := \lim_{t \to \infty} H^t(P_X). \tag{7}$$

The existence of the limit of the $H^t(P_X)$ is also referred to as the existence of entropy
rate where, obviously, a necessary and sufficient condition for entropy rate to exist is

$$\overline{H}(X) = \underline{H}(X) \quad (= H(X)). \tag{8}$$

Throughout this paper, $\Delta^{n-1} = \{x = (x_1, \dots, x_n) \in \mathbb{R}^n \mid x_i \ge 0, \sum_i x_i = 1\}$ is the
usual regular $(n-1)$-dimensional simplex in $\mathbb{R}^n$ and, for technical convenience, log is
the natural logarithm. Although it is more common to use the logarithm to the base
2, switching bases does not affect any analytic property of entropy rate.
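The block entropies $H^t$ of definition (6) can be evaluated by brute-force enumeration for small $t$. The following is a minimal sketch under stated assumptions: the helper names (`block_entropy_rate`, `markov_word_prob`, `row_entropy`) and the two-state transition matrix are illustrative choices, not taken from the paper. It compares $H^t$ of a non-stationary Markov source with the well-known closed formula for Markov sources.

```python
import itertools
import math


def block_entropy_rate(word_prob, alphabet, t):
    """Return H^t(P): entropy of the length-t word distribution, divided by t (natural log)."""
    total = 0.0
    for word in itertools.product(alphabet, repeat=t):
        p = word_prob(word)
        if p > 0.0:  # convention: 0 * log(1/0) := 0
            total += p * math.log(1.0 / p)
    return total / t


def markov_word_prob(init, trans):
    """Word probabilities P(v) of a Markov source (hypothetical helper)."""
    def prob(word):
        p = init[word[0]]
        for a, b in zip(word, word[1:]):
            p *= trans[a][b]
        return p
    return prob


def row_entropy(row):
    """Entropy (natural log) of one row of a stochastic matrix."""
    return sum(p * math.log(1.0 / p) for p in row if p > 0.0)


# Assumed example: two-state chain started deterministically in state 0 (hence not stationary).
trans = [[0.9, 0.1], [0.4, 0.6]]
source = markov_word_prob([1.0, 0.0], trans)

# Closed formula for Markov sources: row entropies weighted by the
# stationary distribution, which is (0.8, 0.2) for this matrix.
markov_rate = 0.8 * row_entropy(trans[0]) + 0.2 * row_entropy(trans[1])

for t in (2, 4, 8, 12):
    print(t, block_entropy_rate(source, (0, 1), t), markov_rate)
```

The enumeration is exponential in $t$, so this is only meant to make the definition tangible; the gap to the closed formula shrinks roughly like $1/t$ in this example.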



3 Analytic properties of entropy rate 

Our main result is the following theorem, which states that entropy rate is Lipschitz
continuous with respect to the topology induced by the norm of total variation. In the
following let $\mathcal{P}$ be the set of the probability measures associated with discrete random
sources, viewed as a normed space. Elements of $\mathcal{P}$ will be denoted by $P$ or $Q$. We
further denote the normed subspace of discrete random sources for which entropy rate
exists by $\mathcal{P}_H$.
Theorem 1 (Lipschitz continuity of entropy rate). The real-valued functionals $\overline{H}$
and $\underline{H}$ on $\mathcal{P}$ are Lipschitzian with $\mathrm{Lip}(\overline{H}) = \mathrm{Lip}(\underline{H}) = \log |\Sigma|$, that is, for $P, Q \in \mathcal{P}$,

$$|\overline{H}(P) - \overline{H}(Q)| \le (\log |\Sigma|)\, d_{TV}(P, Q) \tag{9}$$
$$|\underline{H}(P) - \underline{H}(Q)| \le (\log |\Sigma|)\, d_{TV}(P, Q). \tag{10}$$

Clearly, because of (8), a corollary of the theorem is that the same holds true for
entropy rate itself.

Corollary 1. Entropy rate is Lipschitzian with $\mathrm{Lip}(H) = \log |\Sigma|$, that is,

$$|H(P) - H(Q)| \le (\log |\Sigma|)\, d_{TV}(P, Q) \tag{11}$$

where, here, $P, Q \in \mathcal{P}_H$.

We present two lemmata, which incorporate the essential ideas of the proof of the
theorem. We write

$$d_{TV,t}(P, Q) := \sum_{v \in \Sigma^t} |P(v) - Q(v)|. \tag{12}$$

Lemma 1 says that $\lim_{t \to \infty} d_{TV,t}(P, Q) = d_{TV}(P, Q)$. Note that $d_{TV,t}$ is not a metric
on $\mathcal{P}$.

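Lemma 2. Let $P, Q \in \mathcal{P}$ such that $d_{TV}(P, Q) \le \frac{1}{e}$. Then it holds that

$$|H^t(P) - H^t(Q)| \le \Big(\log |\Sigma| + \frac{1}{t} \log \frac{1}{d_{TV,t}(P, Q)}\Big) \cdot d_{TV,t}(P, Q),$$

where $0 \cdot \log \infty := 0$ in case of $d_{TV,t}(P, Q) = 0$.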
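The inequality of lemma 2 can be checked numerically on small examples. The sketch below is an illustration under assumptions: the sources are a binary i.i.d. source $P$ and a mixture $Q = 0.9\,P + 0.1\,R$ with a second i.i.d. source $R$ (chosen so that $d_{TV}(P,Q) = 0.1\, d_{TV}(P,R) \le 0.2 < 1/e$), and all function names are hypothetical helpers.

```python
import itertools
import math


def iid_word_prob(p_one):
    """Word probabilities of a binary i.i.d. source emitting 1 with probability p_one."""
    def prob(word):
        ones = sum(word)
        return (p_one ** ones) * ((1.0 - p_one) ** (len(word) - ones))
    return prob


def mixture(weight, p, q):
    """Convex combination weight * p + (1 - weight) * q of two sources."""
    return lambda word: weight * p(word) + (1.0 - weight) * q(word)


def block_entropy_rate(word_prob, t):
    """H^t over the binary alphabet (natural log)."""
    total = 0.0
    for word in itertools.product((0, 1), repeat=t):
        p = word_prob(word)
        if p > 0.0:
            total += p * math.log(1.0 / p)
    return total / t


def d_tv_t(p, q, t):
    """d_{TV,t}(P, Q): sum of |P(v) - Q(v)| over all binary words of length t."""
    return sum(abs(p(w) - q(w)) for w in itertools.product((0, 1), repeat=t))


def lemma2_bound(d, alphabet_size, t):
    """Right hand side of lemma 2, with the convention 0 * log(infinity) := 0."""
    if d == 0.0:
        return 0.0
    return (math.log(alphabet_size) + math.log(1.0 / d) / t) * d


P = iid_word_prob(0.3)
Q = mixture(0.9, P, iid_word_prob(0.7))  # assumed example with d_TV(P, Q) <= 0.2

for t in range(1, 11):
    d = d_tv_t(P, Q, t)
    lhs = abs(block_entropy_rate(P, t) - block_entropy_rate(Q, t))
    print(t, d, lhs, lemma2_bound(d, 2, t))
```

A mixture is used rather than two distinct i.i.d. sources because the latter are mutually singular on sequence space, so their distance $d_{TV}$ would exceed $1/e$.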

For the proof of this lemma we will need a technical sublemma. 



Sublemma 3. Let $h(x) := x \log(1/x)$ for $x \in\, ]0, 1]$ and $h(0) = 0$. Then, for $x, y \in [0, 1]$,

$$|x - y| \le \frac{1}{e} \;\Longrightarrow\; |h(x) - h(y)| \le h(|x - y|). \tag{13}$$

Proof. Note first that $h'(x) = \log\frac{1}{x} - 1$ and $h''(x) = -\frac{1}{x}$. Hence $h$ is concave, has
a global maximum at $\frac{1}{e}$ and $h(\frac{1}{e}) = \frac{1}{e}$. Therefore $x \le h(x) \Leftrightarrow x \le \frac{1}{e}$ (*). Because of

$$|h(x) - h(y)| = \big|\, |h(x) - h(\tfrac{1}{e})| - |h(\tfrac{1}{e}) - h(y)| \,\big| \le \max\{|h(x) - h(\tfrac{1}{e})|,\, |h(\tfrac{1}{e}) - h(y)|\} \tag{14}$$

and the fact that $h$ is monotonically increasing on $[0, \frac{1}{e}]$ we can, without loss of
generality, assume that either $x, y \ge \frac{1}{e}$ or $x, y \le \frac{1}{e}$. Because of $|h'(x)| \le 1$ on $[\frac{1}{e}, 1]$ and the
mean value theorem, it holds that

$$|h(x) - h(y)| \le \max_{\frac{1}{e} \le \xi \le 1} |h'(\xi)|\, |x - y| = |x - y|. \tag{15}$$

Because of (*) we obtain the claim for the case $\frac{1}{e} \le x, y \le 1$.

It remains the case (w.l.o.g. $x < y$) $x < y \le \frac{1}{e}$. Here it holds that $|h(x) - h(y)| = h(y) - h(x)$.
We note that the function $\log\frac{1}{x} - 1$ is non-negative and monotonically decreasing
on $]0, \frac{1}{e}]$ (**). We obtain the claim from the calculation

$$|h(x) - h(y)| = \int_x^y \Big(\log\frac{1}{t} - 1\Big)\, dt \stackrel{(**)}{\le} \int_x^y \Big(\log\frac{1}{t - x} - 1\Big)\, dt = \int_0^{y-x} \Big(\log\frac{1}{s} - 1\Big)\, ds = \Big[s \log\frac{1}{s}\Big]_0^{y-x} = h(y - x). \tag{16}$$

□
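As a quick sanity check of sublemma 3, the inequality (13) can be tested on a grid. This is only a numerical illustration, with hypothetical helper names; it also shows that the hypothesis $|x - y| \le 1/e$ cannot simply be dropped.

```python
import math


def h(x):
    """h(x) = x log(1/x) with h(0) = 0 (natural log)."""
    return 0.0 if x == 0.0 else x * math.log(1.0 / x)


def sublemma_holds(x, y, tol=1e-12):
    """Check |h(x) - h(y)| <= h(|x - y|) up to a small floating point tolerance."""
    return abs(h(x) - h(y)) <= h(abs(x - y)) + tol


grid = [i / 200 for i in range(201)]
violations = [
    (x, y)
    for x in grid
    for y in grid
    if abs(x - y) <= 1.0 / math.e and not sublemma_holds(x, y)
]
print("violations under the hypothesis:", violations)

# Without the hypothesis the inequality can fail, e.g. for x = 0.05, y = 0.95.
print("holds for (0.05, 0.95):", sublemma_holds(0.05, 0.95))
```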



Let now $\Delta_K^{n-1} := K \cdot \Delta^{n-1} = \{x = (x_1, \dots, x_n) \in \mathbb{R}^n \mid x_i \ge 0, \sum_i x_i = K\}$.
In a way that is completely analogous to that of showing that entropy attains a maximum
at uniform distributions we infer that, on $\Delta_K^{n-1}$, the function $h_{K,n}(x_1, \dots, x_n) :=
\sum_{i=1}^n x_i \log\frac{1}{x_i}$ (a scaled version of entropy) attains a global maximum at $\bar{x} := (K/n, \dots, K/n)$
(***).

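We are now able to prove lemma 2.

Proof. Obviously $H^t(P) = H^t(Q)$ in case of $d_{TV,t}(P, Q) = 0$. In case of
$d_{TV,t}(P, Q) > 0$,

$$
\begin{aligned}
|H^t(P) - H^t(Q)| &= \frac{1}{t} \Big| \sum_{v \in \Sigma^t} \Big( P(v) \log\frac{1}{P(v)} - Q(v) \log\frac{1}{Q(v)} \Big) \Big| \\
&\stackrel{\text{Subl. 3}}{\le} \frac{1}{t} \sum_{v \in \Sigma^t} |P(v) - Q(v)| \log\frac{1}{|P(v) - Q(v)|} \\
&\stackrel{(***)}{\le} \frac{1}{t} \sum_{v \in \Sigma^t} \frac{d_{TV,t}(P, Q)}{|\Sigma|^t} \log\frac{|\Sigma|^t}{d_{TV,t}(P, Q)} \\
&= \frac{1}{t}\, d_{TV,t}(P, Q) \Big( t \log |\Sigma| + \log\frac{1}{d_{TV,t}(P, Q)} \Big).
\end{aligned}
\tag{17}
$$

□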



To get control of the limes superior resp. inferior involved in the definition of
entropy rate we will further need the following lemma.

Lemma 3. Let $(a_t)$ and $(b_t)$ be two non-negative real-valued sequences such that

$$|a_t - b_t| \le c_t \quad\text{and}\quad \lim_{t \to \infty} c_t = c. \tag{18}$$

Then it holds that

$$|\limsup_{t \to \infty} a_t - \limsup_{t \to \infty} b_t| \le c \tag{19}$$
$$|\liminf_{t \to \infty} a_t - \liminf_{t \to \infty} b_t| \le c. \tag{20}$$

Proof. We only display the proof for (19), as that of (20) can be obtained, mutatis
mutandis, by analogous considerations.

W.l.o.g. assume $a := \limsup a_t \ge \limsup b_t =: b$. Choose a subsequence $k(t)$
such that $\lim_{t \to \infty} a_{k(t)} = a$. We obtain

$$a - b \le a - \limsup_{t \to \infty} b_{k(t)} = \limsup_{t \to \infty} a_{k(t)} - \limsup_{t \to \infty} b_{k(t)} \le \limsup_{t \to \infty} |a_{k(t)} - b_{k(t)}| \le c. \tag{21}$$

□



We are now in a position to prove theorem 1.

Proof. As Lipschitz continuity is a local property, we can assume that $d_{TV}(P, Q) <
\frac{1}{e}$. Setting $a_t := H^t(P)$ and $b_t := H^t(Q)$ we obtain by lemma 2

$$|a_t - b_t| \le d_{TV,t}(P, Q) \Big( \log |\Sigma| + \frac{1}{t} \log\frac{1}{d_{TV,t}(P, Q)} \Big) =: c_t. \tag{22}$$

The definition of $d_{TV,t}$ and lemma 1 lead to

$$\lim_{t \to \infty} c_t = \lim_{t \to \infty} d_{TV,t}(P, Q) \Big( \log |\Sigma| + \frac{1}{t} \log\frac{1}{d_{TV,t}(P, Q)} \Big) = (\log |\Sigma|)\, d_{TV}(P, Q). \tag{23}$$

Plugging $(a_t)$, $(b_t)$ and $(c_t)$ into lemma 3 then yields the desired result. □

In order to elucidate that the structure of the proof strongly depends on the choice
of the norm we rephrase lemma 2 in a more general fashion, without the "soul" of an
entropy. Therefore let

$$h_n(x_1, \dots, x_n) = \frac{1}{\log n} \sum_{i=1}^n x_i \log\frac{1}{x_i} \tag{24}$$

on $\Delta^{n-1}$ where $n \ge 2$ and $0 \log \infty := 0$. A more prosaic version of lemma 2 then
reads

$$|h_n(x) - h_n(y)| \le \|x - y\|_1 \cdot \Big( 1 + \frac{1}{\log n} \log\frac{1}{\|x - y\|_1} \Big), \tag{25}$$
where $\|x\|_1 = \sum_i |x_i|$ as usual. A straightforward consequence of the lemma is

$$\forall \epsilon \in \mathbb{R}_+ \;\exists \delta \in \mathbb{R}_+ \;\forall n \ge 2 \;\forall x, y \in \Delta^{n-1}: \quad \|x - y\|_1 < \delta \;\Longrightarrow\; |h_n(x) - h_n(y)| < \epsilon. \tag{26}$$

After being translated back to entropies, this states that entropy rate is uniformly
continuous on $\mathcal{P}$. We note that the statement of the generalized lemma need not be true
relative to norms $\|\cdot\|_p$ different from $\|\cdot\|_1$. More formally:



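Lemma 4. Let $2 \le p < \infty$ and $\|x\|_p = (\sum_i |x_i|^p)^{1/p}$ the usual $p$-norm on $\mathbb{R}^n$. Then it
holds that

$$\exists \epsilon \in \mathbb{R}_+ \;\forall \delta \in \mathbb{R}_+ \;\exists N \ge 2 \;\exists x, y \in \Delta^{N-1}: \quad \|x - y\|_p < \delta, \; |h_N(x) - h_N(y)| \ge \epsilon$$

which is just the negation of (26), with $\|\cdot\|_1$ replaced by $\|\cdot\|_p$.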

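For the proof we use the notation ($0 < m \le n$)

$$x^*_{m,n} := (\underbrace{\tfrac{1}{m}, \dots, \tfrac{1}{m}}_{m \text{ times}}, 0, \dots, 0) \in \Delta^{n-1}. \tag{27}$$

Proof. Choose $\epsilon = 1/2$ and $\delta \in \mathbb{R}_+$ arbitrarily. Choose an $m \in \mathbb{N}$ such that
$m > 1/\delta^2$, that is, $(1/m)^{1/2} < \delta$. Then, for every $n > m$, as $p \ge 2$,

$$0 < \|x^*_{m,n} - x^*_{n,n}\|_p \le \|x^*_{m,n} - x^*_{n,n}\|_2 = \Big(\frac{1}{m} - \frac{1}{n}\Big)^{1/2} < \delta, \tag{28}$$

but

$$|h_n(x^*_{m,n}) - h_n(x^*_{n,n})| = \frac{1}{\log n}\, |\log m - \log n| \;\underset{n \to \infty}{\longrightarrow}\; 1. \tag{29}$$

Therefore, we find an $N \in \mathbb{N}$ and suitable $x, y \in \Delta^{N-1}$ which support the statement
of the lemma. □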
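The phenomenon behind lemma 4 is easy to observe numerically: the two points $x^*_{m,n}$ and $x^*_{n,n}$ are close in the 2-norm but far apart in the 1-norm, while their normalized entropies $h_n$ differ substantially. The following sketch uses hypothetical helper names and the illustrative values $m = 100$, $n = 10^5$.

```python
import math


def h_n(x):
    """Normalized entropy h_n(x) = (1 / log n) * sum_i x_i log(1 / x_i)."""
    n = len(x)
    return sum(xi * math.log(1.0 / xi) for xi in x if xi > 0.0) / math.log(n)


def x_star(m, n):
    """Uniform distribution on the first m of n coordinates."""
    return [1.0 / m] * m + [0.0] * (n - m)


def p_norm(x, p):
    """Usual p-norm on R^n."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def gap(m, n, p):
    """Return (p-norm distance, 1-norm distance, |h_n difference|) for x*_{m,n} and x*_{n,n}."""
    x, y = x_star(m, n), x_star(n, n)
    diff = [a - b for a, b in zip(x, y)]
    return p_norm(diff, p), p_norm(diff, 1), abs(h_n(x) - h_n(y))


dist2, dist1, hdiff = gap(100, 10 ** 5, 2)
print("2-norm distance:", dist2)
print("1-norm distance:", dist1)
print("|h_n(x) - h_n(y)|:", hdiff)
```

With these values the entropy difference equals $1 - \log m / \log n = 0.6$, although the 2-norm distance is below $0.1$; only the 1-norm distance reflects that the two distributions are far apart.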

Remark. Because of lemma 4 one could intuitively be led to the assumption that
entropy rate need not be continuous with respect to the norms given through the spaces
$L^p(\Omega, \mathcal{B}, P)$, $p \ge 2$. However, this is not true, see [27] for respective considerations.



4 Entropy rate of sources with finite evolution dimension 

In the following we will give a direct proof of the existence of entropy rate of sources
with finite evolution dimension which had been introduced in [6]. See the subsequent
subsection 4.3 for prevalent examples of random sources of finite evolution dimension.

As sources with finite evolution dimension are asymptotically mean stationary [6],
the result can be obtained as a corollary of the theorem of Shannon-McMillan-Breiman
for asymptotically mean stationary sources [9]. However, the following proof is much
simpler. See subsection 4.4 for a detailed comparison of the two proofs.

4.1 Preliminaries

In the following let the shift operator $T : \Omega \to \Omega$ be defined by

$$T(v_0 v_1 v_2 \dots) := v_1 v_2 \dots \tag{30}$$

Obviously, $T$ is measurable. If $(X_t)$ is a discrete random source with associated
measure $P_X$ then

$$(P_X \circ T^{-k})(v) := \sum_{w \in \Sigma^k} P_X(wv) \tag{31}$$

gives rise to a probability measure $P_X \circ T^{-k}$ which is associated with the discrete random
source $((X_k)_t)$ defined through

$$P_{X_k}(\{(X_k)_0 = v_0, (X_k)_1 = v_1, \dots, (X_k)_{t-1} = v_{t-1}\}) := P_X(\{X_k = v_0, X_{k+1} = v_1, \dots, X_{t-1+k} = v_{t-1}\}).$$

A discrete random source $(X_t)$ is said to be of finite evolution dimension if the family
$(P_X \circ T^{-k})_{k \ge 0}$ spans a finite-dimensional subspace in the linear space of finite, signed
measures on $(\Omega, \mathcal{B})$ (see appendix A for the definition of a finite, signed measure).
In the following we will write

$$PT^{-i} := P \circ T^{-i} \quad\text{and}\quad P_n := \frac{1}{n} \sum_{i=0}^{n-1} PT^{-i} \tag{32}$$

for probability measures $P$ associated with random sources.



Theorem 2. If $P$ is a discrete random source of finite evolution dimension there is a
stationary discrete random source $\bar{P}$, called the stationary mean of $P$, such that

$$\lim_{n \to \infty} d_{TV}(P_n, \bar{P}) = 0. \tag{33}$$

Proof. The proof is centered on an elementary fact from linear algebra. As it
requires some of the basic theory of finite, signed measures, we have deferred it to sec. A.3
in the appendix. Note that an alternative, slightly more complicated version of the proof
has already been given in [6]. □



4.2 Proof for the existence of entropy rate 

In order to be prepared for the proof we provide a lemma whose immediate consequence
is that entropy rate coincides for all $P_n$, $n > 0$.

Lemma 5. Let $P$ be a probability measure associated with a random source. Then it
holds that

$$\forall n \in \mathbb{N}: \quad \lim_{t \to \infty} (H^t(P) - H^t(P_n)) = 0. \tag{34}$$

On analytic properties of entropy rate 






Proof. A straightforward consequence of Lemma 2.3.4, 1 1 1 is that for $\alpha \in [0, 1]$
and probability measures $P, Q$:

$$\alpha H^t(P) + (1 - \alpha) H^t(Q) \le H^t(\alpha P + (1 - \alpha) Q) \le \alpha H^t(P) + (1 - \alpha) H^t(Q) + \frac{1}{t} \log 2.$$

Now, by induction on $n$,

$$\frac{1}{n} \sum_{i=0}^{n-1} H^t(PT^{-i}) \le H^t(P_n) \le \frac{1}{n} \sum_{i=0}^{n-1} H^t(PT^{-i}) + \frac{n}{t} \log 2$$

and the assertion follows from lemma 7 (appendix B), which states that the $H^t(PT^{-i})$
asymptotically coincide for all $i \ge 0$. □

We establish that both upper entropy rate $\overline{H}$ and lower entropy rate $\underline{H}$ coincide for
all $P_n$, $n > 0$.

Corollary 2. Let $P$ be the probability measure associated with a random source. Then
it holds that

$$\overline{H}(P) = \overline{H}(P_n) \quad\text{and}\quad \underline{H}(P) = \underline{H}(P_n) \tag{35}$$

for all $n \in \mathbb{N}$.

Proof. Use lemma 5 in order to apply lemma 3 to the sequences $(a_t := H^t(P))$, $(b_t :=
H^t(P_n))$ for the first equation. For the second one rephrase lemma 3 with liminf
instead of lim sup. □

As a consequence, we can prove the existence of entropy rate for finite-evolution- 
dimensional sources. 

Theorem 3 (Existence of entropy rate). Let $P$ be a probability measure associated
with a random source of finite evolution dimension. Let $\bar{P}$ be the stationary mean of $P$.
Then it holds that

$$H(\bar{P}) = H(P) = \lim_{t \to \infty} H^t(P). \tag{36}$$

Therefore, entropy rate of $P$ exists. Moreover, it is equal to the one of the stationary
mean $\bar{P}$.

Proof. As the $P_n$ converge in TV-norm to $\bar{P}$ (theorem 2) we obtain due to the
continuity of $\overline{H}$, $\underline{H}$ (theorem 1)

$$\lim_{n \to \infty} \overline{H}(P_n) = \overline{H}(\bar{P}) \quad\text{and}\quad \lim_{n \to \infty} \underline{H}(P_n) = \underline{H}(\bar{P}). \tag{37}$$

It follows, as $\overline{H}(P_n)$ and $\underline{H}(P_n)$ are constant with respect to $n$ (corollary 2) and
$\overline{H}(\bar{P}) = \underline{H}(\bar{P})$ (as entropy rate exists for stationary sources), that $\overline{H}(P) = \underline{H}(P) =
H(\bar{P})$. □

Remark. Theorem 2 can be generalized to general asymptotically mean stationary
(AMS) sources (see [9] for the theory of AMS sources). However, the proof needs a
sophisticated ergodic theorem, thereby losing the elementary flavour.

4.3 Examples of discrete random sources of finite evolution dimension

In the following, we present two classes of discrete random sources that have finite 
evolution dimension. 

Hidden Markov Sources (HMSs) Hidden Markov sources (HMSs) are the discrete random
sources associated with hidden Markov models (HMMs) (also termed hidden
Markov chains in the related literature). HMSs have been largely studied, see e.g. [24]
for a comprehensive review. In the following, we will give a brief definition of HMMs.

An HMM $M = (\Sigma, S, \pi, A, E)$ is specified by a finite set of output symbols $\Sigma$, a
set of hidden states $S = \{1, \dots, n\}$, a transition probability matrix $A = (A_{ij})_{i,j \in S} \in
\mathbb{R}^{n \times n}$, an initial probability distribution $\pi \in \mathbb{R}^n$ and an emission probability matrix
$E = (E_{iv})_{i \in S, v \in \Sigma} \in \mathbb{R}^{n \times |\Sigma|}$. It gives rise to a discrete random source $p_M$ with values
in the finite set $\Sigma$, referred to as hidden Markov source (HMS), by the idea of changing
hidden states according to the transition probabilities $A_{ij} = P(i \to j)$, where the
first state is picked according to $\pi$, and emitting symbols from the hidden states, as
specified by the emission probabilities $E_{ia} = P(a \text{ is emitted from } i)$. More formally,
in accordance with (2),

$$p_M(v = v_1 \dots v_t) = \sum_{i_1 \dots i_t \in S^t} \pi(i_1) E_{i_1 v_1} A_{i_1 i_2} E_{i_2 v_2} \cdot \ldots \cdot A_{i_{t-1} i_t} E_{i_t v_t}. \tag{38}$$

In the literature, HMSs are often introduced as being induced by finite functions of
Markov chains where emission probability distributions are replaced by a finite function
$f : S \to \Sigma$ mapping hidden states to output symbols. It is straightforward to see that
they give rise to the complete class of HMSs as well.
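The word probabilities of an HMS can be computed without enumerating all hidden state paths. The sketch below is an illustration, not part of the original argument: it evaluates the path sum of (38) directly and compares it with the standard forward recursion, which gives the same value in time linear in the word length. The function names and the two-state parameters are hypothetical.

```python
import itertools


def hms_word_prob(pi, A, E, word):
    """p_M(v) via the forward recursion: alpha[j] = P(v_1 ... v_s, state at step s is j)."""
    n = len(pi)
    alpha = [pi[i] * E[i][word[0]] for i in range(n)]
    for v in word[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * E[j][v] for j in range(n)]
    return sum(alpha)


def hms_word_prob_by_paths(pi, A, E, word):
    """p_M(v) as the explicit sum over all hidden state paths."""
    n = len(pi)
    total = 0.0
    for path in itertools.product(range(n), repeat=len(word)):
        p = pi[path[0]] * E[path[0]][word[0]]
        for s in range(1, len(word)):
            p *= A[path[s - 1]][path[s]] * E[path[s]][word[s]]
        total += p
    return total


# Assumed example parameters: two hidden states, binary output alphabet {0, 1}.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
E = [[0.9, 0.1], [0.5, 0.5]]

for word in itertools.product((0, 1), repeat=3):
    print(word, hms_word_prob(pi, A, E, word), hms_word_prob_by_paths(pi, A, E, word))
```

States and symbols are indexed from 0 here for convenience, whereas the text numbers hidden states from 1.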

It is well known that HMSs have finite dimension or, equivalently, have finite degree
of freedom. See [13] for an early work on the topic and [14] for further related work.
The relationship of finite dimension and finite evolution dimension has been thoroughly
discussed in [6]. It holds that finite evolution dimension is a necessary condition of finite
dimension, which establishes that HMSs are of finite evolution dimension. Examples to
which the generalization of the existence of entropy rate to sources with finite evolution
dimension applies are non-stationary HMSs. A simple example for this might be a
binary-valued source (i.e. $\Sigma = \{0, 1\}$) induced by a "circular" HMM acting on three hidden
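states $S := \{1, 2, 3\}$ with transition resp. emission probability matrix

$$
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\quad\text{resp.}\quad
E = \begin{pmatrix} 1 & 0 \\ 0.5 & 0.5 \\ 0 & 1 \end{pmatrix}.
\tag{39}
$$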

Clearly, this source is not stationary such that the simple existence proof for stationary
sources does not apply. However, as an HMS, this source is of finite evolution dimension
such that theorem 3 ensures the existence of its entropy rate.

See the subsequent sec. 4.4 for a comparison of available proofs of the existence of
entropy rate.

Remark. Related work on analytic properties of entropy rate of HMSs is concerned
with topologies referring to the parameterizations of the HMMs giving rise to the
HMSs, that is, with the natural topologies of real-valued vector spaces (e.g. [20], [21], [22]).
For example, in the special case of binary-valued i.i.d. processes, emitting values from
$\{0, 1\}$, entropy rate is computed as

$$p \log\frac{1}{p} + (1 - p) \log\frac{1}{1 - p} \tag{40}$$

where the only parameter $p$ is the probability that the binary-valued i.i.d. process emits a
1. Clearly, the term $p \log(1/p)$ is not Lipschitz continuous in intervals around zero [$(p \log(1/p))' =
\log(1/p) - 1$]. However, this does not contradict theorem 1 as convergence w.r.t. the
parameterization does not imply convergence w.r.t. the norm of total variation, which we
will briefly outline in the following.

As follows from elementary measure-theoretical considerations, the topology
induced by the norm of total variation is equivalent to that of the general version of the
metric of total variation

$$D_{TV}(P, Q) := \sup_{B \in \mathcal{B}} |P(B) - Q(B)| \tag{41}$$

where $P, Q \in \mathcal{P}$ are two probability measures acting on the measurable space $(\Omega, \mathcal{B})$.
In the case of the measurable sequence spaces under consideration here, the equivalence
of the topologies of the metric and the norm of total variation can be seen by lemma
1, as it follows from straightforward elementary computations that the topologies of
$d_{TV}$ of lemma 1 and the metric of total variation $D_{TV}$ of (41) are equivalent. As a
consequence, convergence in the sense of the norm of total variation is equivalent to
uniform convergence on all measurable sets, that is,

$$\lim_{n \to \infty} \|P - P_n\|_{TV} = 0 \quad\Longleftrightarrow\quad \lim_{n \to \infty} \sup_{B \in \mathcal{B}} |P(B) - P_n(B)| = 0 \tag{42}$$

where $P, P_n \in \mathcal{P}$.

However, as outlined in [20], sec. VIII, convergence of probability measures
induced by hidden Markov models whose parameterizations converge may not even be
strong (see [15] for definitions and characterizations of several forms of convergence of
probability measures), meaning that there might exist a set $B^* \in \mathcal{B}$ for which

$$\limsup_{n \to \infty} |P(B^*) - P_n(B^*)| > 0 \tag{43}$$

where the $P_n$, $n = 0, 1, \dots$ are hidden Markov models whose parameterizations
converge to the parameterization of $P$. According to (42), this means that convergence in
terms of the parameterization does not necessarily imply convergence w.r.t. the norm of
total variation.
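For the binary i.i.d. example this effect can be made fully explicit. Comparing the i.i.d. source with parameter $p > 0$ to the one with parameter $0$ (which emits only zeros), a direct computation gives $d_{TV,t} = 2(1 - (1 - p)^t)$, which tends to 2 as $t \to \infty$ no matter how small $p$ is. The sketch below, with hypothetical helper names, confirms the closed form by enumeration for small $t$ and evaluates it for large $t$.

```python
import itertools


def iid_word_prob(p_one):
    """Word probabilities of a binary i.i.d. source emitting 1 with probability p_one."""
    def prob(word):
        ones = sum(word)
        return (p_one ** ones) * ((1.0 - p_one) ** (len(word) - ones))
    return prob


def d_tv_t(p, q, t):
    """d_{TV,t}(P, Q) by enumeration of all binary words of length t."""
    return sum(abs(p(w) - q(w)) for w in itertools.product((0, 1), repeat=t))


def d_tv_t_against_zero(p_one, t):
    """Closed form of d_{TV,t} between the i.i.d. sources with parameters p_one and 0."""
    return 2.0 * (1.0 - (1.0 - p_one) ** t)


p_small = 0.01
for t in (1, 4, 8):
    print(t, d_tv_t(iid_word_prob(p_small), iid_word_prob(0.0), t), d_tv_t_against_zero(p_small, t))

# The parameters are close (|p - 0| = 0.01), yet the sources are far apart in total variation.
for t in (100, 1000, 10000):
    print(t, d_tv_t_against_zero(p_small, t))
```

So a sequence of parameters $p_n \to 0$ does not yield convergence in the metric $d_{TV}$, which is why the unbounded derivative of $p \log(1/p)$ near zero is compatible with theorem 1.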

Quantum Random Walks (QRWs) Quantum random walks (QRWs) were introduced
to quantum information theory in 2001 as an analogon to classical Markov sources
[1]. For example, they allow one to emulate Markov Chain Monte Carlo approaches on
quantum computers. However, their properties are much less understood. A QRW $Q =
(G, U, \psi_0)$, in a very general form (see [1] for the full range of definitions), is specified
by a directed, $K$-regular graph $G = (V, E)$, a unitary (evolution) operator $U : \mathbb{C}^N \to
\mathbb{C}^N$ and a wave function $\psi_0 \in \mathbb{C}^N$ (i.e. $\|\psi_0\| = 1$ for $\|\cdot\|$ the Euclidean norm) where
$N := K \cdot |V| = |E|$. Dimensions are labeled by edges which in turn are labeled by
$(u, x)$ where $u \in V$ and $x \in X$, $|X| = K$, and $\mathbb{C}^N$ is considered to be spanned by the
orthonormal basis $(e_{(u,x)})_{(u,x) \in V \times X = E}$. A QRW induces a classical random source
$p_Q$ with values in $\Sigma := V$ (i.e. the set of nodes) by the following iterative procedure.
In the first step, the evolution operator is applied to the initial wave function $\psi_0$, and
the resulting wave function $U\psi_0$, with probability $\sum_{x \in X} |(U\psi_0)_{(u_1, x)}|^2$, is collapsed
(i.e. projected and renormalized, which models a quantum mechanical measurement) to
the subspace of $\mathbb{C}^N$ spanned by the vectors $e_{(u_1, x)}$, $x \in X$, that is associated with (the
edges leaving from) node $u_1$, thereby generating the first symbol $u_1$. This procedure
results in a new wave function $\psi_{u_1}$ describing the state the QRW is in after having
generated the first symbol $u_1$. In order to generate a second symbol, $U$ is applied to
$\psi_{u_1}$, and $U\psi_{u_1}$ is, with probability $\sum_{x \in X} |(U\psi_{u_1})_{(u_2, x)}|^2$, collapsed to state $\psi_{u_1 u_2}$,
thereby generating the second symbol $u_2$. Iterative application of this basic procedure
of evolving followed by collapsing yields a sequence of symbols. See [1] for further
details.
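The evolve-then-collapse procedure can be sketched in a few lines. The following is an illustration under explicit assumptions, none of which come from the paper: the graph is a directed cycle with $K = 2$ edge labels per node, the unitary $U$ is a Hadamard "coin" on the edge label followed by a shift along the cycle, and all function names are hypothetical.

```python
import math
import random


def coined_walk_step(psi, num_nodes):
    """Apply the assumed unitary U (Hadamard coin, then shift) to amplitudes psi[(u, x)]."""
    s = 1.0 / math.sqrt(2.0)
    new_psi = {(u, x): 0j for u in range(num_nodes) for x in (0, 1)}
    for u in range(num_nodes):
        a0, a1 = psi[(u, 0)], psi[(u, 1)]
        c0, c1 = s * (a0 + a1), s * (a0 - a1)      # Hadamard coin on the edge label
        new_psi[((u + 1) % num_nodes, 0)] += c0    # label 0 moves one node forward
        new_psi[((u - 1) % num_nodes, 1)] += c1    # label 1 moves one node backward
    return new_psi


def node_probabilities(psi, num_nodes):
    """Probability of collapsing to node u: sum over x of |psi[(u, x)]|^2."""
    return [abs(psi[(u, 0)]) ** 2 + abs(psi[(u, 1)]) ** 2 for u in range(num_nodes)]


def collapse(psi, node, num_nodes):
    """Project onto the subspace of the given node and renormalize."""
    norm = math.sqrt(abs(psi[(node, 0)]) ** 2 + abs(psi[(node, 1)]) ** 2)
    return {
        (u, x): (psi[(u, x)] / norm if u == node else 0j)
        for u in range(num_nodes)
        for x in (0, 1)
    }


def emit_symbols(psi0, num_nodes, length, rng):
    """Generate symbols by repeatedly evolving the wave function and collapsing it."""
    psi, symbols = dict(psi0), []
    for _ in range(length):
        evolved = coined_walk_step(psi, num_nodes)
        probs = node_probabilities(evolved, num_nodes)
        node = rng.choices(range(num_nodes), weights=probs)[0]
        symbols.append(node)
        psi = collapse(evolved, node, num_nodes)
    return symbols


num_nodes = 5
psi0 = {(u, x): 0j for u in range(num_nodes) for x in (0, 1)}
psi0[(0, 0)] = 1 + 0j  # assumed initial wave function, concentrated on one basis vector
print(emit_symbols(psi0, num_nodes, 20, random.Random(0)))
```

Because the walk is measured after every step, each emitted node is a neighbour of the previous one in this toy example; other unitaries and graphs would give different sources.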

A concise formal description, in terms of formulas analogous to (38), of the discrete
random source $p_Q$ along with a proof of QRWs being of finite dimension has been
presented in ESI . Finite evolution dimension follows from finite dimension, which, as 
outlined above, has been thoroughly discussed in [6].



4.4 Comparison of existence proofs of entropy rate 

The result of theorem 3 for the special case of HMSs can be obtained as a combination
of the Shannon-McMillan-Breiman (SMB) theorem for asymptotically mean stationary 
(AMS) sources and the fact that HMSs are AMS Q3D (see also Ej] for a com- 
prehensive review of theoretical results on HMSs). Therefore, the existence of entropy 
rate for arbitrary, stationary and non-stationary, HMSs has theoretically been known
since 1981. For QRWs the result has been known since 2006, implied by combining the
results of |9| and J3] in the same fashion as for HMSs. However, even for HMSs, the 
result seems to be rather unnoticed which might be due to both the complex nature of its 
proof and that the necessary combination of results has not been explicitly mentioned. 
The SMB theorem in this most generalized version is centered around a proof for the 
class of ergodic, stationary random sources [29, 23, 3, 4] which requires involved ergodic
theorems. The extension to general stationary sources [2, 16, 17], in an exemplary (and
elegant) version, needs the sophisticated concept of the ergodic decomposition of
stationary random sources [8]. The final step [9] requires again a collection of non-trivial
theorems as a prerequisite.

The proof given here is substantially simpler from two main aspects. First, it is cen- 
tered around the standard elementary proof of the existence of entropy rate of stationary 
sources. Note that, this way, we do not even need to introduce ergodicity. Second, the 
extension to non-stationary classes of random sources is done by results of exclusively
elementary nature (theorems 1 and 2).

5 Conclusion

We show that entropy rate is Lipschitzian relative to the topology of total variation in
an elementary fashion. Besides providing a comparatively simple existence proof
for HMSs and QRWs, this helps to get a more general grip on entropy rate. Moreover,
it raises some interesting open questions:

- A first open question which immediately arises is whether our arguments can be
strengthened to stricter analytic properties. A first clue is that the definition of
entropy rate as well as theorem 1 can be consistently extended to the whole real vector
space of finite, signed measures. Rademacher's theorem [7] states that Lipschitzian
functionals on finite-dimensional real vector spaces are differentiable almost everywhere
w.r.t. the Lebesgue measure on the Borel sets. This suggests that entropy rate
is close to being differentiable and, so far, we have not succeeded in constructing a
random source at which entropy rate is not differentiable.

- The intuition behind our proof is that entropy rate cannot differ too much if sets 
of typical sequences of two sources overlap to a sufficiently high degree. However, 
it seems to be obvious that entropy rate is continuous when considering it relative 
to sizes of sets of typical sequences which is a more general assumption. A cor- 
responding result would certainly be applicable to coarser topologies as, say, the 
weak topology. So far, it has only been known that entropy rate, as a functional 
on the set of stationary sources only, is upper semicontinuous [ 1 1 relative to the 
weak topology. We believe that theorems of the quality of theorem 1, based on the
comparison of sizes of sets of typical sequences, will greatly improve such results.

A The norm of total variation 

In the following, let $A \,\dot\cup\, B$ be the disjoint union of two sets $A$ and $B$ and $\complement A$ be the
complement of a set $A$.

A.1 Finite signed measures

A finite, signed measure on $(\Omega, \mathcal{B})$ is a $\sigma$-additive but not necessarily positive, finite
set function on $\mathcal{B}$. The most important relevant properties of finite signed measures are
summarized in the following theorem (see [11], ch. VI for proofs).

Theorem 4. 

1. By eventwise addition and scalar multiplication, the set of finite signed measures
can be considered as a real-valued vector space.

2. The Jordan decomposition theorem states that for every $P \in \mathcal{P}$ there are finite
measures $P_+, P_-$ such that

$$P = P_+ - P_- \tag{44}$$

and for all other decompositions $P = P_1 - P_2$ with measures $P_1, P_2$ it holds that
$P_1 = P_+ + \delta$, $P_2 = P_- + \delta$ for another measure $\delta$. In this sense, $P_+$ and $P_-$ are
unique and called positive resp. negative variation. The measure $|P| := P_+ + P_-$
is called total variation.



14 A. Schonhuth 



3. In parallel to the Jordan decomposition we have the Hahn decomposition of $\Omega$ into
two disjoint events $\Omega_+, \Omega_-$

$$\Omega = \Omega_+ \,\dot\cup\, \Omega_- \tag{45}$$

such that $P_-(\Omega_+) = 0$ and $P_+(\Omega_-) = 0$. $\Omega_+, \Omega_-$ are uniquely determined up to
$|P|$-null-sets.

4. The norm of total variation $\|\cdot\|_{TV}$ on $\mathcal{P}$ is given by

$$\|P\|_{TV} := |P|(\Omega) = P_+(\Omega) + P_-(\Omega) = P_+(\Omega_+) + P_-(\Omega_-). \tag{46}$$

Obviously $\|\, |P| \,\|_{TV} = \|P\|_{TV}$.



A.2 Proof of lemma 1

For the proof, we will identify cylinder sets $B \in \mathcal{B}$ with sets of words $A_B \subset \Sigma^t$ as
usual ($B$ is the set of sequences which are the continuations of the words in $A_B$). In our
notation, we correspondingly obtain

$$P(B) = \sum_{v \in A_B} \big( P_+(v) - P_-(v) \big) \tag{47}$$

for a signed measure $P$ with Jordan decomposition $P = P_+ - P_-$. We will further
make use of the approximation theorem (see Halmos [11], p. 56, Th. D) which tells
that, given a measure $P$, an event $B \in \mathcal{B}$ and $\epsilon \in \mathbb{R}_+$, we find a cylinder set $F$ such that

$$P(B \,\Delta\, F) < \epsilon, \tag{48}$$

where $B \,\Delta\, F = (B \setminus F) \cup (F \setminus B)$ is the symmetric set difference. A straightforward
consequence of this is that $|P(B) - P(F)| < \epsilon$.



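Proof. It suffices to show

$$\|P\|_{TV} = \sup_{t \in \mathbb{N}} \sum_{v \in \Sigma^t} |P(v)| = \lim_{t \to \infty} \sum_{v \in \Sigma^t} |P(v)| \tag{49}$$

for an arbitrary finite, signed measure $P$.

The second equation of (49) now follows immediately from

$$\sum_{v \in \Sigma^t} |P(v)| = \sum_{v \in \Sigma^t} \Big| \sum_{a \in \Sigma} P(va) \Big| \le \sum_{v \in \Sigma^t} \sum_{a \in \Sigma} |P(va)| = \sum_{w \in \Sigma^{t+1}} |P(w)| \tag{50}$$

which shows that $(\sum_{v \in \Sigma^t} |P(v)|)_{t \in \mathbb{N}}$ is a monotonically increasing sequence. It
remains to show that it converges to $\|P\|_{TV}$. This translates to demonstrating that, given
$\epsilon \in \mathbb{R}_+$, there is $T \in \mathbb{N}$ with

$$\sum_{v \in \Sigma^T} |P(v)| > \|P\|_{TV} - \epsilon. \tag{51}$$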
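Therefore let $P_+, P_-$ be the Jordan decomposition of $P$ and, correspondingly, $\Omega =
\Omega_+ \,\dot\cup\, \Omega_-$ be the Hahn decomposition. By an application of the approximation theorem
(see above) we find $T_0 \in \mathbb{N}$ and a cylinder set corresponding to $A \subset \Sigma^{T_0}$ with

$$|P|(\Omega_+ \,\Delta\, A) < \frac{\epsilon}{4}, \tag{52}$$

a straightforward ($|P| = P_+ + P_-$) consequence of which is that both

$$P_+(\Omega_+ \,\Delta\, A) < \frac{\epsilon}{4} \quad\text{and}\quad P_-(\Omega_+ \,\Delta\, A) < \frac{\epsilon}{4}. \tag{53}$$

Now note that the obvious $\complement A \,\Delta\, \complement B = A \,\Delta\, B$ in combination with $\Omega_- = \complement \Omega_+$ and
(53) yields

$$P_-(\Omega_- \,\Delta\, \complement A) = P_-(\Omega_+ \,\Delta\, A) < \frac{\epsilon}{4}. \tag{54}$$

(53) and (54) then yield the inequalities

$$P_+(\complement A) \stackrel{P_+(\Omega_-) = 0}{=} P_+(\Omega_+ \setminus A) \le P_+(\Omega_+ \,\Delta\, A) < \frac{\epsilon}{4} \tag{55}$$

and

$$P_-(A) \stackrel{P_-(\Omega_+) = 0}{=} P_-(\Omega_- \setminus \complement A) \le P_-(\Omega_- \,\Delta\, \complement A) < \frac{\epsilon}{4}. \tag{56}$$

Moreover, it is straightforward from (53) and (54) that

$$P_+(A) > P_+(\Omega_+) - \frac{\epsilon}{4} \quad\text{and}\quad P_-(\complement A) > P_-(\Omega_-) - \frac{\epsilon}{4}. \tag{57}$$

We finally compute

$$
\begin{aligned}
\sum_{v \in \Sigma^{T_0}} |P(v)| &= \sum_{v \in A} |P(v)| + \sum_{v \in \Sigma^{T_0} \setminus A} |P(v)| \\
&\ge |P(A)| + |P(\complement A)| \\
&\ge P_+(A) - P_-(A) + P_-(\complement A) - P_+(\complement A) \\
&> \Big(P_+(\Omega_+) - \frac{\epsilon}{4}\Big) - \frac{\epsilon}{4} + \Big(P_-(\Omega_-) - \frac{\epsilon}{4}\Big) - \frac{\epsilon}{4} \\
&= P_+(\Omega_+) + P_-(\Omega_-) - \epsilon = \|P\|_{TV} - \epsilon.
\end{aligned}
\tag{58}
$$

□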



A.3 Proof of theorem 2

We start with the following lemma. 
Lemma 6. Let $P$ be a finite signed measure on $(\Omega, \mathcal{B})$ and $T : \Omega \to \Omega$ a measurable
function. Then $P \circ T^{-1}$ is a finite signed measure for which

$$|P \circ T^{-1}|(B) \le |P|(T^{-1} B) \tag{59}$$

for all $B \in \mathcal{B}$. In particular,

$$\|P \circ T^{-1}\|_{TV} \le \|P\|_{TV}. \tag{60}$$

Proof. Note that $P \circ T^{-1} = P_+ \circ T^{-1} - P_- \circ T^{-1}$ is a decomposition into a
difference of measures. Because of the uniqueness property of the Jordan decomposition
(see th. 4), there is a measure $\delta$ such that $P_+ \circ T^{-1} = (P \circ T^{-1})_+ + \delta$ and
$P_- \circ T^{-1} = (P \circ T^{-1})_- + \delta$. Therefore $|P \circ T^{-1}|(B) = (P \circ T^{-1})_+(B) + (P \circ
T^{-1})_-(B) \le P_+(T^{-1} B) + P_-(T^{-1} B) = |P|(T^{-1} B)$. $B = \Omega$ yields the last assertion,
as $T^{-1} \Omega = \Omega$. □

Proof of Th. 2. We recall that, in th. 2, $T$ was supposed to be the shift operator,
which is measurable. We observe that

$$\mu P := P \circ T^{-1} \tag{61}$$

establishes a linear operator on the vector space of finite, signed measures. Due to
lemma 6, (60), it holds that $\|\mu P\|_{TV} \le \|P\|_{TV}$ for all finite signed measures $P$,
which establishes

$$\|\mu\| \le 1 \tag{62}$$

where $\|\cdot\|$ is the operator norm associated with the norm of total variation.

Now consider the subspace $V_P$ of the finite signed measures spanned by all $P \circ
T^{-i}$, $i \in \mathbb{N}$, for a given finite signed measure $P$. Note that an equivalent description of
finite evolution dimension is just

$$\dim V_P < \infty. \tag{63}$$

Note further that

$$\mu(V_P) \subset V_P. \tag{64}$$

The elementary, linear algebraic lemma 3.2 in [6] states that, given an endomorphism
$F : V \to V$ on a finite-dimensional real- or complex-valued vector space $V$ with
$\|F\| \le 1$, for all $x \in V$ there is an $F$-invariant $\bar{x} \in V$ such that

$$\lim_{n \to \infty} \Big\| \frac{1}{n} \sum_{k=0}^{n-1} F^k x - \bar{x} \Big\| = 0. \tag{65}$$

As all norms are equivalent on $V$, this applies for arbitrary choices of norms $\|\cdot\|$.
Replacing $V$ by $V_P$, $\|\cdot\|$ by $\|\cdot\|_{TV}$, $F$ by $\mu$ and $x$ by $P$ concludes the proof of theorem 2. □
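
B Proof of lemma 7

Lemma 7. Let $P$ be a discrete random source and $k \in \mathbb{N}$. Then it holds that

$$\lim_{t \to \infty} \big( H^t(P) - H^t(P \circ T^{-k}) \big) = 0. \tag{66}$$

Proof. Using the notation

$$I_t^k(P) := \frac{1}{t} \sum_{v \in \Sigma^k} \sum_{w \in \Sigma^t} P(vw) \log\frac{(P \circ T^{-k})(w)}{P(vw)} \tag{67}$$

and

$$J_t^k(P) := \frac{1}{t} \sum_{v \in \Sigma^k} \sum_{w \in \Sigma^t} P(wv) \log\frac{P(w)}{P(wv)} \tag{68}$$

one obtains

$$H^t(P) + J_t^k(P) \stackrel{(*)}{=} \frac{k + t}{t}\, H^{k+t}(P) = I_t^k(P) + H^t(P \circ T^{-k}) \tag{69}$$

where (*) follows from a well known and elementary theorem (e.g. [12], p. 22, theorem
2.1) and the second equation is obtained analogously. Because of

$$0 \le J_t^k(P) \le \frac{k}{t}\, H^k(P \circ T^{-t}) \le \frac{1}{t} \log |\Sigma^k| \;\underset{t \to \infty}{\longrightarrow}\; 0 \tag{70}$$

and

$$0 \le I_t^k(P) \le \frac{k}{t}\, H^k(P) \le \frac{1}{t} \log |\Sigma^k| \;\underset{t \to \infty}{\longrightarrow}\; 0, \tag{71}$$

the assertion follows from an application of the sandwich theorem. □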